Optimization of protein force-field parameters with the Protein Data Bank 



Yoshitake Sakae" and Yuko Okamoto"''' 



"■Department of Functional Molecular Science 
The Graduate University for Advanced Studies 
Okazaki, Aichi 444-8585, Japan 
and 

^Department of Theoretical Studies 
Institute for Molecular Science 
Okazaki, Aichi 444-8585, Japan 



Abstract 

We propose a novel method to optimize existing force-field parameters for protein 
systems. The method consists of minimizing the summation of the square of the force 
acting on each atom in the proteins with the structures from the Protein Data Bank. 
We performed this optimization to the partial-charge and torsion-energy parameters of 
the AMBER parm96 force field, using 100 molecules from the Protein Data Bank. We 
then performed folding simulations of a-helical and /3-hairpin peptides. The optimized 
force-field parameters gave structures more similar to the experimental implications than 
the original AMBER force field. 

1 Introduction 

In the theoretical studies of biomolecular systems, it is now indispensable to use molecular 
simulation techniques such as Monte Carlo (MC) and molecular dynamics (MD) meth- 
ods. Most such simulations use the potential energy functions within the framework of 
classical mechanics consisting of certain energy terms with force-field parameters. Well- 
known force fields (or potential energy functions) are AMBER CHARMM |3], 
OPLS [SlIE], GROMOS m, and ECEPP (Hj. These force fields have been parameterized 
to fit experimental data of small molecules and, for some terms, quantum chemistry cal- 
culations. In fact, it is rather difficult to test the validity of these force-field parameters, 
because simulations of biomoleuclar systems will be hampered by the multiple-minima 
problem; the simulations will get trapped in states of energy local minima and will yield 
unreliable results. Hence, it is essential to employ powerful simulation algorithms such as 
generalized-ensemble algorithms (for a recent review, see Ref. jH]) in order to have accurate 
comparisons of force fields. For instance, we have recently carried out detailed compar- 
isons of three versions (parm94 [T], parm96 0, and parmQQ 0) of AMBER, CHARMM 
jlj, OPLS-AA/L P, and GROMOS by generalized-ensemble simulations of small pep- 
tides [HI!. 
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If any problems are found in the force fields, we have to improve the force-field pa- 
rameters by some means. In this Letter, we propose a novel method for optimization of 
force-field parameters. Our method utilizes the protein structures in the Protein Data 
Bank (PDB) j23], in which the number of entries is rapidly increasing. Actually, there 
already exist many works of similar knowledge-based methods of force-field development 
^5-|22]- Our method is different from most of previous works in two points: we deal with 
all-atom force fields such as AMBER, and we use only the PDB data without introducing 
so-called decoys (or, misfolded structures). 

In section 2 the details of the formulation of our method are described. In section 3 an 
example of the application of our method is presented. Conclusions and future prospects 
are presented in section 4. 

2 Methods 

The existing all-atom force fields for protein systems such as AMBER and CHARMM use 
essentially the same functional forms for the conformational potential energy -Econf except 
for minor differences. -Econf can be written as, for instance, 

-Econf = EbL + EbA + -^torsion + -^nonbond , (1) 

Ebl= E Ke{i-ee^)\ (2) 

bond length i 

Eba= E MO-Oc^Y, (3) 

bond angle 8 

^torsion = E E + COs(n$ - 7„)] , (4) 
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dihedral angle <I> ^ 



-^nonbond ^ ^ 



-^ij B{j QiQj 



12 fi ' 

i<j I'ij ' ij "'^JJ 



(5) 



Here, E^l, E^a, and -^torsion represent the bond-stretching term, the bond-bending 
term, and the torsion-energy term, respectively. The bond-stretching and bond-bending 
energies are given by harmonic terms with the force constants Ki and Kg, and the equi- 
librium positions, £eq and ^eq- The torsion energy is, on the other hand, described by the 
Fourier series in Eq. where the sum is taken over all dihedral angles $, n is the num- 
ber of waves, 7„ is the phase, and Vn is the Fourier coefficient. The nonbonded energy in 
Eq. is represented by the Lennard- Jones and Coulomb terms between pairs of atoms, 
i and j, separated by the distance Vij. The parameters Aij and Bij in Eq. ^ are the 
coefficients for the Lennard- Jones term, qi is the partial charge of the i-th atom in the 
electrostatic term, and e is the dielectric constant, where we set e = 1. Hence, we have 
five classes of force-field parameters, namely, those in the bond-stretching term {Ki and 
ieq), those in the bond-bending term {Kg and 6'eq), those in the torsion term (Vn and 7^), 
those in the Lennard- Jones term [Aij and Bij), and those in the electrostatic term (qi). 

We now describe our new method for optimizing these force-field parameters. We first 
select N molecules in PDB. We try to choose proteins from different folds (such as all 
a-helix, all /3-sheet, a/j3, etc.) and different homology classes as much as possible. If the 



2 



force-field parameters are of ideal values, we expect that all the chosen native structures 
are stable without any force acting on each atom in the molecules. Hence, we expect 



F = , (6) 

where 



m=l ^^m 

and 

L = . (8) 

Here, Nm is the total number of atoms in molecule m, i?tot is the total potential energy, 
and fi is the force acting on atom i. In reality, F 7^ 0, and because F > 0, we expect 
that we can optimize the force-field parameters by minimizing F with respect to these 
parameters. In practice, we perform a simulation in the force-field parameter space for 
this minimization. 

Proteins are usually in aqueous solution, and hence we also have to incorporate some 
kind of solvent effects. Because the more the total number of proteins (A^) is, the better 
the force-field parameter optimizations are expected to be, we want to minimize our efforts 
in the calculations of the solvent effects. Here, we employ the generalized-Born/surface 
area (GB/SA) terms for the solvent contributions [23 12^1 • Hence, we use in Eq. dSJ 

Ftot = Fconf + Fsolv , (9) 

where 

-f^soiv = -E'gb + E^K , (10) 

^SA = E^fe^fe- (12) 

k 

Namely, in the GB/SA model, the total solvation free energy in Eq. (fTTl|l is given by the 
sum of a solute-solvent electrostatic polarization term, a solvent-solvent cavity term, and a 
solute-solvent van der Waals term. A solute-solvent electrostatic polarization term can be 
calculated by the generalized Born equation (fTTj) . where aij = ^a^aj, ai is the so-called 
Born radius of atom i, Dij = rf^/ (20;,^)^, and is the dielectric constant of bulk water (we 
take = 78.3). A solvent- solvent cavity term and a solute-solvent van der Waals term 
can be approximated by the term that is proportional to the solvent accessible surface 
area in Eq. (fT^ . Here, is the total solvent-accessible surface area of atoms of type k 
and (Tfc is an empirically determined proportionality constant |^ I26j. 

The flowchart of our method for the optimization of force-field parameters is shown 
in Fig. m 

In Step 1 we try to obtain as many structures as possible from PDB. The number 
is limited by the computer power that we have available in our laboratory. We want 
to choose proteins with different sizes (numbers of amino acids), different folds, and 
different homology classes as much as possible. We also want to use only those with high 
experimental resolutions. Note that only atomic coordinates of proteins are extracted 
from PDB (and coordinates from other molecules such as crystal water are neglected). 
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If we use data from X-ray experiments, hydrogen atoms are missing, and thus in Step 2 
we have to add hydrogen coordinates. Many protein simulation software packages provide 
with routines that add hydrogen atoms to the PDB coordinates, and one can use one of 
such routines. 

We now have protein coordinates ready, but usually such "raw data" result in 
very high total potential energy and strong forces will be acting on some of the atoms 
in the molecules. This is because the hydrogen coordinates we added as above are not 
based on experimental results and have rather large uncertainties. The coordinates of 
heavy atoms from PDB also have experimental uncertainties. We take the position that 
we leave the coordinates of heavy atoms as they are in PDB as much as possible, and 
adjust the hydrogen coordinates to adjust this mismatch. This is why we want to include 
as many PDB data as possible with high experimental resolution (so that the effects 
of experimental errors in PDB may be minimal). We thus minimize the total potential 
energy i^tot = -E'conf + -^soiv + -E'constr with respect to the coordinates for each protein 
conformation, where -Econstr is the constraint energy term that is imposed on the heavy 
atoms in PDB (it is referred to as the "predefined constraints" in Steps 3 and 5 in Fig. P): 

-E^constr = E Kxix-Xo)"^. (13) 

heavy atom 

Here, is the force constants of the restriction (in the present work, we use the value 
of 100 kcal/mol/A2), and xq are the original coordinate vectors of heavy atoms in PDB. 
Because we are searching for the nearest local-minimum states, usual minimization rou- 
tines such as conjugate-gradient method and Newton-Raphson method can be employed 
here. As one can see from Eq. (fTBj) . the coordinates of hydrogen atoms will be mainly 
adjusted, but unnatural heavy-atom coordinates will also be modified. We perform this 
minimization for all protein structures separately, and obtain A^ refined structures. 

Given A^ set of "ideal" reference coordinates in Step 3, we now optimize the first set of 
force-field parameters in Step 4. In Eq. we have five classes of force-field parameters 
as mentioned above. Namely, the force-field parameters are those in the bond-stretching 
term {Ki and icq), those in the bond-bending term {Kg and 6'eq), those in the torsion 
term [Vn and 7„), those in the Lennard- Jones term [Aij and Bij), and those in the 
electrostatic term (g^). Because they are of very different nature, we believe that it is 
better to optimize these classes of force- field parameters separately (as in Steps 4, 6, and 
so on in Fig. ^). For each set of force-field parameters, the optimization is carried out 
by minimizing F in Eq. ((7j) with respect to these parameters. Here, E'tot in E<1- dHl) is 
given by Eq. ^ For this purpose usual minimization routines such as conjugate- gradient 
method are not adequate, because we need a global optimization. One can employ more 
powerful methods such as simulated annealing |2S1 and generalized-ensemble algorithms 
jnj. We perform this minimization simulation in the above parameter space to obtain the 
parameter values that give the global minimum of F. 

These processes are repeated until the optimized force-field parameters converge. 

We can, in principle, optimize all the force-field parameters following the flowchart in 
Fig. ^ In the example given in the next section, however, we just optimize two classes 
of the force-field parameters; namely, the partial charges (g^) and the backbone torsion 
energy parameters (Vi, V2, and V3). Here, we fix the phases (7„) and the main-chain 
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torsion energy term is written as (see Eq. (j3))) 

E^=^,^ = y [1 + cos^*^)] + y + - tt)] + ^ [1 + cos(3$)] , (14) 

for each backbone dihedral angle (j) and ip. As for the partial charge optimization, we 
impose a condition that the total charge of each amino acid remains constant, which is 
the usual assumption adopted by force fields of Eqs. (0)^® based on classical mechanics. 

We believe that these two classes of parameters have the most uncertainty among 
all the force-field parameters. This is because partial charges are usually obtained by 
quantum chemistry calculations of an isolated amino acid in vacuum separately, which is 
a very different condition from that in amino acids of proteins in aqueous solution and 
because torsion-energy term is the most problematic (for instance, parm94, parm96, and 
parm99 versions of AMBER differ mainly in torsion-energy parameters). 



3 Results and discussion 

We now present an example of the application of our force-field optimization scheme 
presented in the previous section. The force field that we optimized is the AMBER 
parm96 version 0. We have optimized two sets of parameters. The first set is the partial 
charge parameters (g^ in Eq. ^) and there are 602 such parameters altogether in AMBER. 
The second set is the backbone torsion-energy parameters {Vi, V2, and V3 in Eq. (fT^ ) 
and there are six such parameters (three each for and ip) . We used the program package 
TINKER |2Zj for all the calculations in the present work. 

In Step 1 of the flowchart of Fig. ^ we chose 100 PDB files (A^ = 100) with resolution 
1.8 A or better and with less than 200 residues (the average number of resiudes is 120.4) 
from PISCES |2H1. Their PDB code names are 2LIS, lEPO, ITIF, 1EB6, ICIL, ICCW, 
2PTH, 1I6W, IDBF, IKPF, ILRI, lAAP, 1C75, 1CC8, 1FK5, IKQR, IKIE, ICZP, IGPO, 
IKOI, IIQZ, 3EBX, 1140, lEJG, lAMM, 1107, 1GK8, IGVP, 1M4I, lEYV, 1E29, 1I2T, 
IVCC, IFMO, lEXR, IGUT, 1H4X, IGBS, IBOB, 119L, IIFC, IDLW, lEAJ, IGGZ, 
1JR8, 1RB9, IVAP, IJZG, 1M55, 1EN2, 1C90, 2ERL, lEMV, 1F41, 1EW6, 2TNF, 
IIFR, USE, IKAF, IHZT, IHQK, IFXL, IBKR, IIDO, ILQV, 1G2R, 1KR7, IQTN, 
1D40, lEAZ, 2CY3, lUGI, IIJV, 3VUB, IBZP, IJYR, IDZK, IQFT, lUTG, 2CPG, 
1I6W, 1C7K, 1180, 1L07, ILNI, lEQO, INDD, 1HD2, 3PYP, 1FD3, 1DK8, IWHI, 
IFAZ, 4FGF, 2MHR, 1JB3, 2MCM, IIGD, 1C5E, and IJIG. 

The minimization for the coordinate refinement in Step 3 of the flowchart was done 
with the constraint of Eq. (fT!^ . In Fig. Efa), a part of the original PDB structure of one of 
the molecules (2LIS) and the corresponding part in the refined structure are superposed. 
As expected, we see good coincidence of heavy-atom coordinates and small deviations 
in the hydrogen coordinates. RMSD (Root Mean Square Distance) (for heavy atoms) 
of these two conformations of the entire molecule (2LIS) is indeed small: 0.027 A (the 
average of all 100 proteins is 0.03 A). 

As also expected, the structure of the raw PDB data (that obtained in Step 2 of the 
flowchart) is not stable in the sense that the force acting on some atoms in the molecule 
will be large, while the refined structure (that obtained in Step 3 of the flowchart) will not 
have abnormally large forces acting. This is illustrated in Fig. |21^b). The average absolute 
value of the components of the force before minimization and after minimization is 9.0 
kcal/mol/A and 1.5 kcal/mol/A, respectively. 
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In Step 4 of the flowchart, we performed the optimization of partial-charge parameters 
by MC simulated annealing. Namely, we minimized F in Eq. ((7j) by MC simulated 
annealing simulations of these parameters (the parameters are updated and the updates 
are accepted or rejected according to the Metropolis criterion). For this we introduced 
an effective "temperature" for the parameter space. Each simulation run consisted of 
50,000 MC sweeps with the temperature decreased exponentially from 200 to 0.01. The 
simulation was repeated 10 times with different initial random numbers. The time series 
of F from one of the simulations is shown in Fig. Efa). We see that F decreases quickly 
in the beginning until about 5,000 MC sweeps and then it decreases very slowly; the total 
number of MC sweeps (50,000) seems sufficient. The optimized partial charges are taken 
from those that resulted in the lowest F value. 

Each term contributing to F (i.e., each component of the force acting on the atoms) 
before and after the optimization is compared in Fig. I2tc). We see that many terms 
decreased in magnitude whole. 

In Tables ^ El and El three examples (alanine, glutamic acid, and tyrosine) of the 
obtained partial charges together with the original AMBER values are listed. The magni- 
tudes of the charges seem to decrease a little in general, although there are exceptions (see, 
for instance, CG and CE of tyrosine). Although the sign of the partial charges remains 
the same for those with large magnitude, charges with small magnitude sometimes change 
their sign (see, for example, CG of glutamic acid and CA, CB, and HB of tyrosine). 

In Step 5 of the flowchart, the original coordinates obtained in Step 2 were again 
minimized with the constraint of Eq. ()1H|) . but this time the optimized parameters were 
used. The average RMSD of 100 proteins is 0.03 A, and the coordinates of heavy atoms 
have little changed. 

In Step 6 of the flowchart, we carried out the optimization of the six torsion-energy 
parameters (Vi, V2, and V3 in Eq. (fT^ for both and ip) by minimizing F in Eq. ((Zj) 
with MC simulated annealing simulations in this parameter space. Each simulation run 
consisted of 10,000 MC sweeps with the temperature decreasing from 1,000 to 1.0. The 
simulation was repeated several times with different random numbers, but the six op- 
timized parameters all agreed with one another. The time series of F from one of the 
simulations is shown in Fig. Efb), and the obtained torsion-energy parameters are listed 
in Table m Each term contributing to F before and after the optimization is compared in 
Fig. |2fd). The term correspoding to the backbone atoms that are relevant to the torsion- 
energy parameters decreased, as expected. Non-zero contributions are presumably to be 
reduced by optimization of other force-field parameters. 

In the present example, we stopped our process in Step 6 of the flowchart and did not 
iterate the optimizations. 

We tested the effectiveness of the new force field by applying it to the folding simu- 
lations of two peptides, C-peptide of ribonuclease A and the C-terminal fragment of the 
Bl domain of streptococcal protein G, which is sometimes referred to as G-peptide j^ . 
The C-peptide has 13 residues and its amino-acid sequence is Lys-Glu-Thr-Ala-Ala-Ala- 
Lys-Phe-Glu-Arg-Gln-His-Met. This peptide has been extensively studied by experiments 
and is known to form an a-helix structure [211 jHOj- Because the charges at peptide ter- 
mini are known to affect helix stability [211 1201! we blocked the termini by a neutral 
COCH3- group and a neutral -NH2 group. The G-peptide has 16 residues and its amino- 
acid sequence is Gly-Glu-Trp-Thr-Tyr-Asp-Asp-Ala-Thr-Lys-Thr-Phe-Thr-Val-Thr-Glu. 
The termini were kept as the usual zwitter ionic states, following the experimental con- 
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ditions jH2l EBl Elj • This peptide is known to form a /3-hairpin structure by experiments 

El 123 Ell- 
Simulated annealing MD simulations of 1 ns were performed for both peptides from 
extended initial conformations. The unit time step was set to 1.0 fs (hence, each run 
consists of 1,000,000 MD steps). The temperature during MD simulations was controlled 
by Berendsen's method [HI]. For each simulation the temperature was decreased from 
2,000 K to 200 K. As for solvent effects, in order to carry out quantitative test, we want 
to include accurate contributions such as explicit water molecules. Here, for simplicity, 
however, we used the GB/SA model in Eqs. (fTU Il - lfT^ was used. For both peptides, these 
folding simulations were repeated 16 times with different initial conformations. As for 
force fields, both the original AMBER parm96 force field and the optimized force field 
were used. 

In Fig. m the lowest-energy conformations of C-peptide obtained from each of the 
folding simulations are shown. They are numbered in increasing order of energy. Almost 
all conformations are «-helix structure in the case of the original AMBER parm96 (see 
Fig. m^a)), although a short /3-hairpin structure is also found in two conformations (Nos. 
12 and 16). RMSD (only Cq, atoms are taken into account) from the corresponding part 
of the native X-ray structure of the entire ribonulcease A was the lowest for conformations 
Nos. 3 (2.47 A), 14 (2.59 A), and 15 (2.81 A). For the optimized force field (see Fig.H^b)), 
five out of 16 conformations (Nos. 2, 7, 8, 9, and 11) are a-helix structure, and /3-hairpin 
structure also appeared in Nos. 12, 14, and 15. Though both a-helix and /9-hairpin 
structures appear with the optimized force field, a-helix is more favored than /5-hairpin 
in the sense that the energy is lower for conformations with a-helix than those with (3- 
hairpin. RMSD was the lowest for conformations Nos. 13 (3.17 A), 16 (3.50 A), and 11 
(3.51 A). Although the original AMBER parm96 gave more conformations closer to the 
native structure (smaller RMSD) than the optimized force field, it seems to favor a-helix 
too much. The experiments imply only 30 % a-helix formations around 0° C [211, which 
is more consistent with the results for the optimized force field. 

In Fig. |31 the lowest-energy conformations of G-peptide obtained from the folding 
simulations are shown. The results for the original AMBER parm96 (see Fig. Efa)), 
are again almost all a-helical. This is in contradiction with the experimental results 
[32| ISni 1^ which imply the formation of a /5-hairpin. RMSD from the corresponding part 
of the native X-ray structure of the entire protein G was the lowest for conformations Nos. 
14 (4.52 A), 13 (5.30 A), and 8 (5.53 A). For the optimized force field (see Fig. Uh)), 
as many as eight out of 16 conformations (Nos. 3, 4, 5, 8, 10, 11, 14, and 15) yielded 
/3-hairpin structure, and a-helix structure also appeared in Nos. 5 and 14. RMSD was 
the lowest for conformations Nos. 7 (3.74 A), 4 (3.84 A), and 11 (4.00 A). Hence, for the 
case of G-peptide, the improvement of the force field by our optimization procedure is 
great (both in secondary structure content and in RMSD). 

4 Conclusions 

In this Letter, we proposed a novel method to optimize the force-field parameters for 
protein systems. Our method is knowledge-based in the sense that we utilize the protein 
structures in the Protein Data Bank. With this method one can optimize not only the 
existing force-field parameters such as AMBER, CHARMM, OPLS, and so on, but also 



7 



any force field as long as its functional form is given. 

We have presented an example of the application of the present method to the original 
AMBER parm96 force field by optimizing the partial-charge and backbone torsion-energy 
parameters. There is still a lot of testing to be done. For instance, other parameter sets 
in AMBER parm96 as well as other force fields should also be tried. We used only 100 
protein structures, and this number should be increased. As for solvent effects in the 
optimization procedure, we employed the generalized- Born/surface area model, but the 
validity of this approximation should be tested. Namely, we have to check how insensitive 
or sensitive the final optimized parameters are to the solvation theory used. Finally, 
because the parameter space we optimize may be large, we want to use a more powerful 
optimization algorithm than simulated annealing. Generalized-ensemble algorithm [Oj 
may be one choice of such optimization methods. 
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Table 1: Partial-charge parameters of alanine 



Atom 


parm96 


optimized 


N 


-0.4157 


-0.3317 


CA 


0.0337 


0.0628 


C 


0.5973 


0.4996 


HN 


0.2719 


0.2358 





-0.5679 


-0.5533 


HA 


0.0823 


0.0923 


CB 


-0.1825 


-0.0394 


HB 


0.0603 


0.0112 


Total 


0.0000 


-0.0003 



Table 2: Partial-charge parameters of glutamic acid 



Atom 


parm96 


optimized 


N 


-0.5163 


-0.4207 


CA 


0.0397 


0.0642 


C 


0.5366 


0.4635 


HN 


0.2936 


0.2618 





-0.5819 


-0.6054 


HA 


0.1105 


0.1255 


CB 


0.0560 


0.1215 


HB 


-0.0173 


-0.0387 


CG 


0.0136 


-0.0724 


HG 


-0.0425 


-0.0307 


CD 


0.8054 


0.8307 


OE 


-0.8188 


-0.8149 


Total 


-1.0000 


-0.9999 
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Table 3: Partial- charge parameters of tyrosine 



At om 


parm96 


optimized 


N 


—0.4157 


—0.3492 


CA 


— n DDI 4 


0437 


C 




5467 


HN 


0.2719 


0.2480 





—0.5679 


—0.5705 


HA 


0.0876 


0969 


CB 


—0.0152 


0.0673 


HB 


0.0295 


-0.0098 


CG 


-0.0011 


-0.2136 


CD 


-0.1906 


0.0974 


HD 


0.1699 


0.0636 


CE 


-0.2341 


-0.3145 


HE 


0.1656 


0.1376 


CZ 


0.3226 


0.2875 


OH 


-0.5579 


-0.5024 


HH 


0.3992 


0.3968 


Total 


0.0000 


-0.0002 



Table 4: Backbone torsion-energy parameters 



Dihedral angle 


Vi/2 


V2/2 


V3/2 


(j) (parm96) 


0.850 


0.300 


0.0 


(optimized) 


1.209 


0.380 


-0.018 


ip (parm96) 


0.850 


0.300 


0.0 


ip (optimized) 


0.133 


0.578 


0.050 
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Retrieve A' native structures ( one structure per protein \ Irom PDB 



Add hydrogen atoms iTniH availtible in PDB 



3. litfine eacli structure in 2, by minimizing the total ptitential energy 
(with the optimised farce-Held pcirameters if already nptimi/tid) 
U'iih respect lo their coordinate.^ with predefined consirainis on coordinates 



4. Optimize the first set of foree-tlcid parameters by minimizing f in Eq. (7) (calculated 
from the refined structures obtained in 3, J with respect to tltese tlr^t set of parameters 



5. Refine each struct tire in 2. by minimizing ihe Lotjjl polenlial energy 

(with the optimized force-field paratneters) widi respect to their coordinates 
with predefined constraints on coordinates 



6, OptiiTi 1 7;e the second set of force- field pi* n. i '■ ni " if' ~ii f ' ,iT 'i .f 
from the refined .sirtictures obtained in 5.) with respect to these second set of parameters 




New fbrce-tleld parameters 



Figure 1: The flowchart of our method for the optimization of force-fleld parameters. 
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15 r 




( ta) Coordinate Number (d) Coordinate Number 



Figure 2: (a) Superposition of a part of the original PDB structure (PDB code: 2LIS) 
with hydrogen atoms added and the corresponding part in the refined structure. These 
structures were obtained in Steps 2 and 3 of the flowchart in Fig. [TJ (b) Components of 
the force acting on the atoms in the original structure of 2LIS (red dotted curve) and its 
refined structure (blue solid curve). The structures are the same as those in (a). The 
force field is the original AMBER parm96. (c) Components of the force acting on the 
atoms in the refined structure of 2LIS. The force field is the original AMBER parm96 (red 
dotted curve) and AMBER parm96 with the partial-charge parameters optimized (blue 
solid curve), (d) Components of the force acting on the atoms in the refined structure of 
2LIS. The force field is the original AMBER parm96 with the partial-charge parameters 
optimized (red dotted curve) and AMBER parm96 with the partial-charge and torsion- 
energy parameters optimized (blue solid curve). In (b), (c), and (d), only 200 components 
are shown. 
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Figure 3: Time series of MC simulated annealing simulations in force-field parameter 
space for partial-charge (a) and torsion-energy (b) parameters. 
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Figure 4: The lowest-energy conformations of C-peptide obtained by each of the simulated 
annealing MD simulations using the original AMBER parm96 (a) and the optimized force 
field (b). The conformations are ordered in the increasing order of energy. The figures 
were created with RasMol |H5j . 
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(b) 



Figure 5: The lowest-energy conformations of G-peptide obtained by each of the simulated 
annealing MD simulations using the original AMBER parm96 (a) and the optimized force 
field (b). The conformations are ordered in the increasing order of energy. The figures 
were created with RasMol |H5j . 
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